Take-home Exercise 1: Understanding and Analysing Thailand Road Accident Data

Published

September 5, 2023

Modified

September 22, 2024

1 Thailand’s Killer Roads

Road traffic injuries pose a significant public health issue in Thailand, with a high number of fatalities, injuries, and disabilities each year. These incidents have a profound impact not only on the victims and their families but also on society and the nation as a whole. According to the World Health Organization (WHO), approximately 20,000 people lose their lives in road accidents annually, equating to about 56 deaths each day.

Despite numerous government initiatives to reduce road casualties, the situation has seen minimal improvement. These issues underscore the need for a deeper understanding of road traffic accidents, which can be largely attributed to two primary factors: behavioral and environmental.

Behavioral factors, such as driver behavior and performance, are significant contributors to traffic accidents. These factors include driving style and skills, as well as direct and indirect driving behaviors. Environmental factors encompass conditions such as poor visibility during adverse weather (e.g., heavy rain or fog) and challenging road features (e.g., sharp bends, slippery slopes, and blind spots).

Previous studies have highlighted the value of Spatial Point Patterns Analysis in examining road traffic accidents. However, these studies often focus solely on either behavioral or environmental factors, neglecting the influence of temporal factors such as season, day of the week, or time of day.

In light of this, it is essential to investigate the factors affecting road traffic accidents in the Bangkok Metropolitan Region (BMR) using both spatial and spatio-temporal point patterns analysis methods. This approach aims to identify the leading causes of accidents and provide insights that can guide the development of effective policies and interventions to enhance road safety.

2 Getting Started

2.1 Loading Packages

In this exercise, we will be using the following packages:

Package Description
sf For importing, managing, and handling geospatial data
spatstat For point pattern analysis. In this hands-on exercise, it will be used to perform 1st- and 2nd-order spatial point patterns analysis and derive kernel density estimation (KDE) layer.
sfdep Used to compute spatial weights
tmap For thematic mapping
leaflet For interactive maps
tidyverse For non-spatial data wrangling
DT, knitr, kableExtra, gtsummary For building tables
plotly To create interactive plots

The following code chunk uses p_load() of pacman package to check if the aforementioned packages are installed in the computer. If they are, the libraries will be called into R.

pacman::p_load(sf, sfdep, spatstat,
               tmap, leaflet, 
               raster,
               tidyverse,
               DT, knitr, kableExtra, gtsummary,
               plotly)

2.2 The Data

File Source Screenshot

Thailand - SubnationalAdministrativeBoundaries, in SHP file format

There are 3 administrative levels in the shapefile: 0 (country), 1 (province), 2 (district), and 3 (sub-district, tambon) boundaries. We will use level 1 to filter for the Bangkok Metropolitan Region.

Humanitarian Data Exchange
Thailand Roads (OpenStreetMap Export), in SHP file format Humanitarian Data Exchange

Thailand Road Accident [2019-2022], in CSV format

This dataset provides a comprehensive statistics on recorded road accidents in Thailand from 2019 to 2022, including time location, accident type, weather condition, and road characteristics.

Kaggle

2.2.1 Traffic Accident Data

read_csv() of the readr package allows us to import csv files into R Studio.

accidents <- read_csv("data/geospatial/thai_road_accident_2019_2022.csv")
glimpse(accidents)
Rows: 81,735
Columns: 18
$ acc_code                    <dbl> 571905, 3790870, 599075, 571924, 599523, 5…
$ incident_datetime           <dttm> 2019-01-01 00:00:00, 2019-01-01 00:03:00,…
$ report_datetime             <dttm> 2019-01-02 06:11:00, 2020-02-20 13:48:00,…
$ province_th                 <chr> "ลพบุรี", "อุบลราชธานี", "ประจวบคีรีขันธ์", "เชียงใ…
$ province_en                 <chr> "Loburi", "Ubon Ratchathani", "Prachuap Kh…
$ agency                      <chr> "department of rural roads", "department o…
$ route                       <chr> "แยกทางหลวงหมายเลข 21 (กม.ที่ 31+000) - บ้านวั…
$ vehicle_type                <chr> "motorcycle", "private/passenger car", "mo…
$ presumed_cause              <chr> "driving under the influence of alcohol", …
$ accident_type               <chr> "other", "rollover/fallen on straight road…
$ number_of_vehicles_involved <dbl> 1, 1, 2, 1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1, …
$ number_of_fatalities        <dbl> 0, 0, 1, 0, 0, 0, 0, 1, 3, 0, 0, 1, 0, 0, …
$ number_of_injuries          <dbl> 2, 2, 0, 1, 0, 2, 2, 0, 0, 1, 1, 0, 1, 1, …
$ weather_condition           <chr> "clear", "clear", "clear", "clear", "clear…
$ latitude                    <dbl> 14.959105, 15.210738, 12.374259, 18.601721…
$ longitude                   <dbl> 100.87346, 104.86269, 99.90795, 98.80420, …
$ road_description            <chr> "straight road", "straight road", "wide cu…
$ slope_description           <chr> "no slope", "no slope", "slope area", "no …

A quick look into the data using glimpse() of the dplyr package reveals that there are 18 variables in the data, including:

Column Name Description
acc_code The accident code or identifier.
incident_datetime The date and time of the accident occurrence.
report_datetime The date and time when the accident was reported.
province_th The name of the province in Thailand, written in Thai.
province_en The name of the province in Thailand, written in English.
agency The government agency responsible for the road and traffic management.
route The route or road segment where the accident occurred.
vehicle_type The type of vehicle involved in the accident.
presumed_cause The presumed cause or reason for the accident.
accident_type The type of nature of the accident.
number_of_vehicles_involved The number of vehicles involved in the accident.
number_of_fatalities The number of fatalities resulting from the accident.
number_of_injuries The number of injuries resulting from the accident.
weather_condition The weather condition at the time of the accident.
latitude The latitude coordinate of the accident location.
longitude The longitude coordinate of the accident location.
road_description The description of the road type or configuration where the accident occurred.
slope_description The description of the slope condition at the accident location.

Since the province_th and route columns are in Thai, I will remove them. The province column is already available in English, and the route information can be projected using the latitude and longitude columns. Additionally, I will remove the report_datetime column and use incident_datetime for our analysis.

accidents <- accidents %>% 
  select(-c(province_th, 
            route, 
            report_datetime))

The gtsummary package provides us with descriptive statistics of the dataset and also includes the amount of missingness in each variable.

Table of Variable Summary
Characteristic N N = 81,7351
acc_code 81,735 3,824,084 (3,789,460, 5,831,089)
incident_datetime 81,735 2019-01-01 to 2022-12-31 23:55:00
province_en 81,735
    Amnat Charoen
196 (0.2%)
    Ang Thong
1,125 (1.4%)
    Bangkok
6,439 (7.9%)
    buogkan
583 (0.7%)
    Buri Ram
327 (0.4%)
    Chachoengsao
1,407 (1.7%)
    Chai Nat
692 (0.8%)
    Chaiyaphum
471 (0.6%)
    Chanthaburi
1,360 (1.7%)
    Chiang Mai
3,407 (4.2%)
    Chiang Rai
858 (1.0%)
    Chon Buri
4,120 (5.0%)
    Chumphon
1,039 (1.3%)
    Kalasin
659 (0.8%)
    Kamphaeng Phet
822 (1.0%)
    Kanchanaburi
1,165 (1.4%)
    Khon Kaen
1,224 (1.5%)
    Krabi
741 (0.9%)
    Lampang
1,018 (1.2%)
    Lamphun
744 (0.9%)
    Loburi
1,036 (1.3%)
    Loei
742 (0.9%)
    Mae Hong Son
308 (0.4%)
    Maha Sarakham
832 (1.0%)
    Mukdahan
668 (0.8%)
    Nakhon Nayok
386 (0.5%)
    Nakhon Pathom
891 (1.1%)
    Nakhon Phanom
462 (0.6%)
    Nakhon Ratchasima
3,330 (4.1%)
    Nakhon Sawan
1,403 (1.7%)
    Nakhon Si Thammarat
1,363 (1.7%)
    Nan
931 (1.1%)
    Narathiwat
477 (0.6%)
    Nong Bua Lam Phu
288 (0.4%)
    Nong Khai
297 (0.4%)
    Nonthaburi
827 (1.0%)
    Pathum Thani
1,923 (2.4%)
    Pattani
395 (0.5%)
    Phangnga
328 (0.4%)
    Phatthalung
1,068 (1.3%)
    Phayao
543 (0.7%)
    Phetchabun
1,704 (2.1%)
    Phetchaburi
873 (1.1%)
    Phichit
226 (0.3%)
    Phitsanulok
800 (1.0%)
    Phra Nakhon Si Ayutthaya
1,571 (1.9%)
    Phrae
1,466 (1.8%)
    Phuket
444 (0.5%)
    Prachin Buri
809 (1.0%)
    Prachuap Khiri Khan
1,141 (1.4%)
    Ranong
140 (0.2%)
    Ratchaburi
497 (0.6%)
    Rayong
770 (0.9%)
    Roi Et
721 (0.9%)
    Sa Kaeo
699 (0.9%)
    Sakon Nakhon
973 (1.2%)
    Samut Prakan
2,241 (2.7%)
    Samut Sakhon
1,015 (1.2%)
    Samut Songkhram
524 (0.6%)
    Saraburi
1,158 (1.4%)
    Satun
321 (0.4%)
    Si Sa Ket
739 (0.9%)
    Sing Buri
448 (0.5%)
    Songkhla
1,231 (1.5%)
    Sukhothai
1,305 (1.6%)
    Suphan Buri
3,056 (3.7%)
    Surat Thani
1,983 (2.4%)
    Surin
843 (1.0%)
    Tak
1,859 (2.3%)
    Trang
784 (1.0%)
    Trat
469 (0.6%)
    Ubon Ratchathani
751 (0.9%)
    Udon Thani
946 (1.2%)
    unknown
34 (<0.1%)
    Uthai Thani
503 (0.6%)
    Uttaradit
949 (1.2%)
    Yala
343 (0.4%)
    Yasothon
504 (0.6%)
agency 81,735
    department of highways
75,304 (92%)
    department of rural roads
5,115 (6.3%)
    expressway authority of thailand
1,316 (1.6%)
vehicle_type 81,735
    4-wheel pickup truck
28,445 (35%)
    6-wheel truck
3,019 (3.7%)
    7-10-wheel truck
2,364 (2.9%)
    bicycle
257 (0.3%)
    large passenger vehicle
421 (0.5%)
    large truck with trailer
6,257 (7.7%)
    motorcycle
14,874 (18%)
    motorized tricycle
294 (0.4%)
    other
3,513 (4.3%)
    passenger pickup truck
325 (0.4%)
    pedestrian
219 (0.3%)
    private/passenger car
20,814 (25%)
    three-wheeled vehicle
28 (<0.1%)
    tractor/agricultural vehicle
63 (<0.1%)
    van
842 (1.0%)
presumed_cause 81,735
    abrupt lane change
134 (0.2%)
    aggressive driving/overtaking
12 (<0.1%)
    brake/anti-lock brake system failure
55 (<0.1%)
    cutting in closely by people/vehicles/animals
6,724 (8.2%)
    dangerous curve
61 (<0.1%)
    debris/obstruction on the road
294 (0.4%)
    disabled vehicle without proper signals
37 (<0.1%)
    disabled vehicle without proper signals/signs
5 (<0.1%)
    driving in the wrong lane
37 (<0.1%)
    driving under the influence of alcohol
1,464 (1.8%)
    driving without headlights/illumination
20 (<0.1%)
    external disturbance
2 (<0.1%)
    failure to signal enter/exit parking
43 (<0.1%)
    failure to yield right of way
133 (0.2%)
    failure to yield/signal
215 (0.3%)
    falling asleep
4,700 (5.8%)
    ignoring stop sign while leaving intersection
79 (<0.1%)
    illegal overtaking
445 (0.5%)
    inadequate visibility
13 (<0.1%)
    insufficient light
69 (<0.1%)
    internal disturbance
1 (<0.1%)
    loss of control
49 (<0.1%)
    medical condition
40 (<0.1%)
    narrow road
10 (<0.1%)
    navigation equipment failure
1 (<0.1%)
    no presumed cause related to driver
1 (<0.1%)
    no presumed cause related to road conditions
4 (<0.1%)
    no presumed cause related to vehicle conditions
1 (<0.1%)
    no road divider lines
1 (<0.1%)
    no traffic light system
1 (<0.1%)
    no traffic signs
9 (<0.1%)
    obstruction in sight
7 (<0.1%)
    other
1,576 (1.9%)
    overloaded vehicle
176 (0.2%)
    repair/construction on the road
6 (<0.1%)
    reversing vehicle
70 (<0.1%)
    road in poor condition
7 (<0.1%)
    running red lights/traffic signals
911 (1.1%)
    slippery road
85 (0.1%)
    speeding
60,373 (74%)
    straddling lanes
31 (<0.1%)
    sudden stop
48 (<0.1%)
    tailgating
181 (0.2%)
    traffic light system failure
3 (<0.1%)
    turn signal system failure
2 (<0.1%)
    unfamiliarity with the route/unskilled driving
677 (0.8%)
    using mobile phone while driving
33 (<0.1%)
    using psychoactive substances
1 (<0.1%)
    vehicle electrical system failure
11 (<0.1%)
    vehicle equipment failure
2,747 (3.4%)
    worn-out/tire blowout
127 (0.2%)
    เส้นแบ่งทิศทางจราจรชำรุด
1 (<0.1%)
    ป้ายจราจรชำรุด
1 (<0.1%)
    มึนเมาจากแอลกอฮอล์
1 (<0.1%)
accident_type 81,735
    collision at intersection corner
1,067 (1.3%)
    collision during overtaking
83 (0.1%)
    collision with obstruction (on road surface)
2,742 (3.4%)
    head-on collision (not overtaking)
3,944 (4.8%)
    other
4,188 (5.1%)
    pedestrian collision
945 (1.2%)
    rear-end collision
24,901 (30%)
    rollover/fallen on curved road
10,443 (13%)
    rollover/fallen on straight road
33,046 (40%)
    side collision
336 (0.4%)
    turning/retreating collision
40 (<0.1%)
number_of_vehicles_involved 81,735 1.00 (1.00, 2.00)
number_of_fatalities 81,735 0.00 (0.00, 0.00)
number_of_injuries 81,735 0.00 (0.00, 1.00)
weather_condition 81,735
    clear
69,943 (86%)
    dark
368 (0.5%)
    foggy
544 (0.7%)
    land slide
1 (<0.1%)
    natural disaster
47 (<0.1%)
    other
162 (0.2%)
    rainy
10,670 (13%)
latitude 81,376 14.5 (13.5, 16.6)
    NA
359
longitude 81,376 100.56 (99.89, 101.29)
    NA
359
road_description 81,735
    bridge (across river/canal)
5 (<0.1%)
    connecting to private area
463 (0.6%)
    connecting to public/commercial area
1,001 (1.2%)
    connecting to school area
121 (0.1%)
    four-way intersection
348 (0.4%)
    grade-separated intersection/ramps
184 (0.2%)
    lane-changing area
7 (<0.1%)
    merge lane
19 (<0.1%)
    motorcycle lane
1 (<0.1%)
    other
7,614 (9.3%)
    pedestrian path
1 (<0.1%)
    roundabout
11 (<0.1%)
    sharp curve
616 (0.8%)
    straight road
58,183 (71%)
    t-intersection
414 (0.5%)
    u-turn area
10 (<0.1%)
    wide curve
12,552 (15%)
    y-intersection
184 (0.2%)
    zebra crossing/pedestrian crossing
1 (<0.1%)
slope_description 81,735
    no slope
65,797 (81%)
    other
11,575 (14%)
    slope area
4,363 (5.3%)
1 Median (IQR); Range; n (%)

The code chunk performs several functions for preparing and transforming the dataset:

  1. The filter() function from the dplyr package is used to remove rows with missing or empty longitude and latitude values, ensuring that the dataset contains only valid spatial data.
  2. The sf package’s st_as_sf() function converts the data frame into a spatial object (simple feature) using the longitude and latitude columns, setting the coordinate reference system (CRS) to EPSG 4326.
  3. The spatial data is reprojected to EPSG 32647 (a local UTM projection used in Thailand) using st_transform() from the sf package.
  4. The filter() function is also applied to retain data exclusively from the Bangkok Metropolitan Region (BMR) by filtering for specific provinces.
accidents_bmr <- accidents %>% 
  
  # Removes rows with missing longitude and latitude values
  filter(!is.na(longitude) & longitude != "",
         !is.na(latitude) & latitude !="") %>% 
  
  # Converts data to an sf object using longitude and latitude 
  st_as_sf(coords = c("longitude", "latitude"),
           crs = 4326) %>%
  
  # Transforms to the projection used in Thailand
  st_transform(crs = 32647) %>% 
  
  # Filter for BMR data
  filter(province_en %in% c("Bangkok", "Nonthaburi", "Nakhon Pathom", "Pathum Thani", "Samut Prakan", "Samut Sakhon")) 

Let’s perform a check for duplicated records before we move on. The code chunk below identifies all rows in the accidents_bmr dataframe that have an exact duplicate (i.e., another row with the same values in all columns) using group_by_all().

duplicate <- accidents_bmr %>% 
  group_by_all() %>% 
  filter(n()>1) %>% 
  ungroup()
  
duplicate
Simple feature collection with 0 features and 13 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: WGS 84 / UTM zone 47N
# A tibble: 0 × 14
# ℹ 14 variables: acc_code <dbl>, incident_datetime <dttm>, province_en <chr>,
#   agency <chr>, vehicle_type <chr>, presumed_cause <chr>,
#   accident_type <chr>, number_of_vehicles_involved <dbl>,
#   number_of_fatalities <dbl>, number_of_injuries <dbl>,
#   weather_condition <chr>, road_description <chr>, slope_description <chr>,
#   geometry <GEOMETRY [m]>

Results confirm that there are no duplicated records found.

write_rds(accidents_bmr, "data/rds/accidents_bmr.rds")
accidents_bmr <- read_rds("data/rds/accidents_bmr.rds")

Let’s take a quick glance at the points:

tmap_mode('plot')

tm_shape(accidents_bmr) +
  tm_dots(col = "#800200",
          alpha=0.4, 
          size=0.05) +
  tm_layout(main.title = "Accidents",
            main.title.position = "center",
            main.title.size = 1,
            bg.color = "#E4D5C9",
            frame = F)

2.2.2 Administrative Boundary

The adminboundary dataset, which we downloaded from HDX, is in ESRI shapefile format. To use this data in an R-environment, we need to import it as an sf object. We can do this using the st_read() function of the sf package. This function reads the shapefile data and returns an sf object that can be used for further analysis.

In the code chunk below, we use %>% operator is used to pipe the output of st_read() to the st_transform() function. Since the dataset we are using is the Thailand boundary, we need to assign the standard coordinate reference system for Thailand, which is EPSG 32647. st_transform() function transforms the coordinate reference system of the sf object to 32647.

adminboundary <- st_read(dsn = "data/geospatial", 
                layer = "tha_admbnda_adm1_rtsd_20220121") %>% 
  st_transform(crs = 32647)
Reading layer `tha_admbnda_adm1_rtsd_20220121' from data source 
  `C:\kytjy\ISSS626-GAA\Take-home_Ex\Take-home_Ex01\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 77 features and 16 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 97.34336 ymin: 5.613038 xmax: 105.637 ymax: 20.46507
Geodetic CRS:  WGS 84
st_crs(adminboundary)
Coordinate Reference System:
  User input: EPSG:32647 
  wkt:
PROJCRS["WGS 84 / UTM zone 47N",
    BASEGEOGCRS["WGS 84",
        ENSEMBLE["World Geodetic System 1984 ensemble",
            MEMBER["World Geodetic System 1984 (Transit)"],
            MEMBER["World Geodetic System 1984 (G730)"],
            MEMBER["World Geodetic System 1984 (G873)"],
            MEMBER["World Geodetic System 1984 (G1150)"],
            MEMBER["World Geodetic System 1984 (G1674)"],
            MEMBER["World Geodetic System 1984 (G1762)"],
            MEMBER["World Geodetic System 1984 (G2139)"],
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ENSEMBLEACCURACY[2.0]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4326]],
    CONVERSION["UTM zone 47N",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",99,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",0.9996,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",500000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Navigation and medium accuracy spatial referencing."],
        AREA["Between 96°E and 102°E, northern hemisphere between equator and 84°N, onshore and offshore. China. Indonesia. Laos. Malaysia - West Malaysia. Mongolia. Myanmar (Burma). Russian Federation. Thailand."],
        BBOX[0,96,84,102]],
    ID["EPSG",32647]]

Preliminary look into the adminboundary data shows that ADM1_EN can be used to filter for the 6 regions in BMR.

glimpse(adminboundary)
Rows: 77
Columns: 17
$ Shape_Leng <dbl> 2.417227, 1.695100, 1.251111, 1.884945, 3.041716, 1.739908,…
$ Shape_Area <dbl> 0.13133873, 0.07926199, 0.05323766, 0.12698345, 0.21393797,…
$ ADM1_EN    <chr> "Bangkok", "Samut Prakan", "Nonthaburi", "Pathum Thani", "P…
$ ADM1_TH    <chr> "กรุงเทพมหานคร", "สมุทรปราการ", "นนทบุรี", "ปทุมธานี", "พระนครศรีอ…
$ ADM1_PCODE <chr> "TH10", "TH11", "TH12", "TH13", "TH14", "TH15", "TH16", "TH…
$ ADM1_REF   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ ADM1ALT1EN <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ ADM1ALT2EN <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ ADM1ALT1TH <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ ADM1ALT2TH <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ ADM0_EN    <chr> "Thailand", "Thailand", "Thailand", "Thailand", "Thailand",…
$ ADM0_TH    <chr> "ประเทศไทย", "ประเทศไทย", "ประเทศไทย", "ประเทศไทย", "ประเทศ…
$ ADM0_PCODE <chr> "TH", "TH", "TH", "TH", "TH", "TH", "TH", "TH", "TH", "TH",…
$ date       <date> 2019-02-18, 2019-02-18, 2019-02-18, 2019-02-18, 2019-02-18…
$ validOn    <date> 2022-01-22, 2022-01-22, 2022-01-22, 2022-01-22, 2022-01-22…
$ validTo    <date> -001-11-30, -001-11-30, -001-11-30, -001-11-30, -001-11-30…
$ geometry   <MULTIPOLYGON [m]> MULTIPOLYGON (((674339.8 15..., MULTIPOLYGON (…
adminboundary_bmr <- adminboundary %>% 
  select("ADM1_EN") %>% 
  filter(ADM1_EN %in% c("Bangkok", "Nonthaburi", "Nakhon Pathom", "Pathum Thani", "Samut Prakan", "Samut Sakhon"))
write_rds(adminboundary_bmr, "data/rds/adminboundary_bmr.rds")
adminboundary_bmr <- read_rds("data/rds/adminboundary_bmr.rds")

After importing the dataset, we will plot it to see how it looks using tmap.

tmap_mode('plot')

tm_shape(adminboundary_bmr)+
  tm_fill(col ="#f4e9e8", 
          alpha = 0.6) +
  tm_borders(col = "#ddafa1",
             lwd = 0.1,  
             alpha = 1) +
  tm_layout(main.title = "BMR Administrative Boundary",
            main.title.position = "center",
            main.title.size = 1,
            bg.color = "#E4D5C9",
            frame = F)

Let’s plot out the administrative boundaries together with the points of accidents:

tmap_mode('plot')

tm_shape(adminboundary_bmr) +
  tm_fill(col ="#f4e9e8", 
          alpha = 0.6) +
  tm_borders(col = "#ddafa1",
             lwd = 0.1,  
             alpha = 1) +
tm_shape(accidents_bmr) +
  tm_dots(col = "#800200",
          alpha=0.4, 
          size=0.05) +
  tm_layout(main.title = "BMR Administrative Boundary and Accidents",
            main.title.position = "center",
            main.title.size = 1,
            bg.color = "#E4D5C9",
            frame = F)

2.2.3 Road Lines

The code chunk below imports MP_SUBZONE_WEB_PL shapefile by using st_read() of sf packages.

roads <- st_read(dsn = "data/geospatial", 
                layer = "hotosm_tha_roads_lines_shp")
Reading layer `hotosm_tha_roads_lines_shp' from data source 
  `C:\kytjy\ISSS626-GAA\Take-home_Ex\Take-home_Ex01\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 2792590 features and 14 fields
Geometry type: MULTILINESTRING
Dimension:     XY
Bounding box:  xmin: 97.34457 ymin: 5.643645 xmax: 105.6528 ymax: 20.47168
CRS:           NA
roads_sf <- st_set_crs(roads, 4326) %>% 
  st_transform(crs = 32647) %>% 
  st_cast("LINESTRING")
st_crs(roads_sf)
Coordinate Reference System:
  User input: EPSG:32647 
  wkt:
PROJCRS["WGS 84 / UTM zone 47N",
    BASEGEOGCRS["WGS 84",
        ENSEMBLE["World Geodetic System 1984 ensemble",
            MEMBER["World Geodetic System 1984 (Transit)"],
            MEMBER["World Geodetic System 1984 (G730)"],
            MEMBER["World Geodetic System 1984 (G873)"],
            MEMBER["World Geodetic System 1984 (G1150)"],
            MEMBER["World Geodetic System 1984 (G1674)"],
            MEMBER["World Geodetic System 1984 (G1762)"],
            MEMBER["World Geodetic System 1984 (G2139)"],
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ENSEMBLEACCURACY[2.0]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4326]],
    CONVERSION["UTM zone 47N",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",99,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",0.9996,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",500000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Navigation and medium accuracy spatial referencing."],
        AREA["Between 96°E and 102°E, northern hemisphere between equator and 84°N, onshore and offshore. China. Indonesia. Laos. Malaysia - West Malaysia. Mongolia. Myanmar (Burma). Russian Federation. Thailand."],
        BBOX[0,96,84,102]],
    ID["EPSG",32647]]

Results above show that the EPSG is indicated as 32647 now.

glimpse(roads_sf)
Rows: 2,792,591
Columns: 15
$ name       <chr> "ถนนฉลองกรุง", "ซอยฉลองกรุง 1/1", NA, NA, "ถนนฉลองกรุง", NA, "…
$ name_en    <chr> "Chalong Krung Road", "Soi Chalong Krung 1/1", NA, NA, "Cha…
$ highway    <chr> "secondary", "residential", "secondary_link", "service", "s…
$ surface    <chr> "paved", NA, NA, NA, "concrete", NA, NA, "unpaved", NA, NA,…
$ smoothness <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ width      <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ lanes      <chr> NA, NA, NA, NA, "2", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ oneway     <chr> "yes", NA, "yes", NA, "yes", NA, NA, NA, NA, NA, NA, NA, NA…
$ bridge     <chr> NA, NA, NA, NA, "yes", NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ layer      <chr> NA, NA, NA, NA, "1", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ source     <chr> NA, NA, NA, NA, "Bing", NA, NA, "GPS", NA, NA, NA, NA, NA, …
$ name_th    <chr> "ถนนฉลองกรุง", "ซอยฉลองกรุง 1/1", NA, NA, "ถนนฉลองกรุง", NA, "…
$ osm_id     <dbl> 1125681229, 594401607, 472283206, 594401608, 116847248, 317…
$ osm_type   <chr> "ways_line", "ways_line", "ways_line", "ways_line", "ways_l…
$ geometry   <LINESTRING [m]> LINESTRING (693686.1 151979..., LINESTRING (6933…

Next, we will look at the classification of road networks in the data.

unique(roads_sf$highway)
 [1] "secondary"      "residential"    "secondary_link" "service"       
 [5] "tertiary"       "path"           "footway"        "track"         
 [9] "unclassified"   "trunk"          "trunk_link"     "primary"       
[13] "primary_link"   "steps"          "motorway_link"  "cycleway"      
[17] "pedestrian"     "tertiary_link"  "motorway"       "construction"  
[21] "road"           "raceway"        "corridor"       "living_street" 
[25] "escape"         "proposed"       "busway"         "bridleway"     
[29] "abandoned"      "parth"          "barrier"        "paved"         

Upon reviewing the road classifications against the highway descriptions by OpenStreetMap, we observe that not all categories are relevant to our analysis, which focuses primarily on driving networks. We will keep the 13 types of road segments below for the scope of our study.

Name Description
Primary, Primary_Link Major highway linking large towns
Secondary, Secondary_Link Highways which are not part of major routes, but nevertheless form a link in the national route network
Tertiary, Tertiary_Link Roads connecting smaller settlements, and within large settlements for roads connecting local centres
Trunk, Trunk_Link Major highway that don’t meet the requirements for motorways, and their link roads (sliproads and ramps).
Motorway, Motorway_Link Highest-performance roads within a territory, generally referred to as motorways, freeways, or expressways.
Residential Road generally used for local traffic within the settlement. Primarily used for access to residential properties but may include access to some non-residential properties (e.g. a corner shop or convenience store).
Unclassified Roads with the lowest priority in the interconnecting road network
Service Provides access to a building service station, beach, campsite, industrial estate, business park, etc.

In the code chunk below, we will first specify the road classes that we want to retain. Next, we will filter roads_sf object to remove all the rows that does not have our desired highway attribute value.

highwaytypes <- c("primary",
                  "primary_link", 
                  "secondary", 
                  "secondary_link", 
                  "tertiary", 
                  "tertiary_link",
                  "trunk",
                  "trunk_link",
                  "motorway",
                  "motorway_link",
                  "residential",
                  "unclassified",
                  "service") 

roads_sf_filtered <- roads_sf %>%
  select("highway") %>% 
  filter(highway %in% highwaytypes)

unique(roads_sf_filtered$highway)
 [1] "secondary"      "residential"    "secondary_link" "service"       
 [5] "tertiary"       "unclassified"   "trunk"          "trunk_link"    
 [9] "primary"        "primary_link"   "motorway_link"  "tertiary_link" 
[13] "motorway"      

Since the roads dataset includes areas outside BMR and our analysis is focused on the region within the BMR, we will need to remove unnecessary rows. To do so, we will use st_intersection().

roads_bmr <- st_intersection(roads_sf_filtered,
                            adminboundary_bmr)
write_rds(roads_bmr, "data/rds/roads_bmr.rds")
roads_bmr <- read_rds("data/rds/roads_bmr.rds")

Now that we have scoped out the dataset, we will now plot to see the BMR road network.

tmap_mode("plot")

tm_shape(adminboundary_bmr) +
  tm_polygons() 

#tm_shape(roads_bmr) +
#  tm_lines(col="highway", palette ="viridis") +
#  tm_layout(main.title = "Road Network in Singapore",
#            main.title.position = "center",
#            main.title.size = 1.2,
#            legend.outside = TRUE,
#            frame = TRUE) +
#  tm_borders(alpha = 0.5)
par(bg = '#E4D5C9', mar = c(0,0,1,0))

plot(st_geometry(adminboundary_bmr),
     col = "#eeeae2",
     main = "Road Network in BMR")
plot(st_geometry(roads_bmr),
     add=T,
     col='#800200')

3 Data Wrangling

3.1 Deriving new variables from Accidents data

Using the incident_datetime column, we can also derive additional columns such as seasons, month, day of the week, time of the day (e.g. morning or evening peak periods). The morning rush hour is said to last from 6am to 9am and evening rush hour is reported to be from 4pm to 7pm (The Nation).

I also think it would be interesting to derive a column that indicates accidents that happened during the Songkran festival holiday. Notoriously dubbed as the Seven Deadly Days of Songkran, road accidents in Bangkok is said to surge due increased traffic from people traveling to celebrate with family.

accidents_bmr_extra <- accidents_bmr %>% 
  # Derive month, days, hour columns as well as a Songkran indicator
  mutate(
    month = month(incident_datetime,
                  label = TRUE),
    day = wday(incident_datetime,
                    label = TRUE),
    hour = hour(incident_datetime),
    songkran = ifelse(
      as_date(incident_datetime) >= as_date(paste0(year(incident_datetime), "-04-09")) &
        as_date(incident_datetime) <= as_date(paste0(year(incident_datetime), "-04-16")) &
        year(incident_datetime) %in% c(2019, 2020, 2021, 2022),
      1,
      0)) %>%

  # Derive season column and peak period indicator
  mutate(
    weektype = ifelse(
      day %in% c("Sat", "Sun"),
      "weekend",
      "weekday"
    ),
    season = ifelse(
      month %in% c("Feb", "Mar", "Apr", "May"),
      "Summer",
      ifelse(
        month %in% c("Jun", "Jul", "Aug", "Sep", "Oct"),
        "Rainy",
        "Winter"
        )
      ),
    peakperiod = ifelse(
      hour > 6 & hour < 9,
      "morningpeak",
      ifelse(
        hour > 16 & hour < 19,
        "eveningpeak",
        "non-peak"
      )
    )
    ) %>% 

  # Drop columns not required anymore
  select(-c("acc_code", 
            "incident_datetime", 
            "agency",
            "month", "day", "hour"))

The code chunk above performs the following function:

  1. The lubridate package within tidyverse is utilised to extract temporal components (month, day of the week, and hour) from the incident_datetime column using the month(), wday(), and hour() functions, respectively.
  2. The songkran indicator is created using ifelse(), along with as_date() and year(), to generate a binary variable (1 for “Yes”, 0 for “No”) indicating whether the accident occurred during the Songkran festival (April 9 to April 16) in the years 2019, 2020, 2021, or 2022.
  3. Two additional indicators are derived using ifelse():
    • season variable: classifies accidents into seasons (“Summer”, “Rainy”, or “Winter”) based on the month.
    • peakperiod variable: identifies whether the accident occurred during morning peak hours (7-9 AM), evening peak hours (5-7 PM), or non-peak times.
  4. Finally, select() from dplyr is used to keep only the necessary variables for the study.

3.2 Converting sf format into spatstat’s ppp format

In order to use the capabilities of spatstat package, a spatial dataset should be converted into an object of class planar point pattern (ppp). A ppp object contains the spatial coordinates of the points, the marks attached to the points (if any), the window in which the points were observed, and the name of the unit of length for the spatial coordinates. Thus, a single object of class ppp contains all the information required to perform spatial point pattern analysis.

In previous section, we have created sf objects of accident points. Now, we will convert them into ppp objects using as.ppp() function from spatstat package.

The code chunk below converts the accidents_bmr_lean object to a point pattern object of class ppp. st_coordinates() function is used to extract the coordinates of the accidents_bmr_lean object and st_bbox() function is used to extract the bounding box of the accidents_bmr_lean object. The resulting object accidents_ppp is a point pattern object of class ppp.

accidents_ppp <- as.ppp(st_coordinates(accidents_bmr_extra),
                        st_bbox(accidents_bmr_extra))

par(bg = '#E4D5C9',
    mar = c(0,0,1,0))

plot(accidents_ppp)

3.3 Duplicates check

We will use summary() function to get summary information of accidents_ppp object.

summary(accidents_ppp)
Planar point pattern:  12986 points
Average intensity 1.218049e-06 points per square unit

*Pattern contains duplicated points*

Coordinates are given to 10 decimal places

Window: rectangle = [591277.5, 710166.1] x [1486845.7, 1576520.5] units
                    (118900 x 89670 units)
Window area = 10661300000 square units

3.4 Jittering

To resolve the issue of duplicated points, we apply jittering with the rjitter() function. This adds a small variation to the points, preventing them from occupying the exact same location.

set.seed(1234)
accidents_ppp_jit <- rjitter(accidents_ppp, 
                             retry=TRUE, 
                             nsim=99, 
                             drop=TRUE)

The code below checks the jittered points in the chosen simulation (i.e. Simulation 99) to ensure that no duplicates remain after applying jittering.

any(duplicated(accidents_ppp_jit[["Simulation 99"]]))
[1] FALSE
par(bg = '#E4D5C9',
    mar = c(0,0,1,0))

accidents_ppp_jit <- accidents_ppp_jit[["Simulation 99"]]

plot(accidents_ppp_jit)

3.5 Creating Observation Windows

Many data types in spatstat require us to specify the region of space inside which the data were observed. This is the observation window and it is represented by an object of class owin. In this analysis, our study area is BMR, hence we will use BMR boundary as the observation window for spatial point pattern analysis.

To convert our adminboundary_bmr sf object to owin object, we will use as.owin() function from spatstat package.

adminboundary_bmr_owin <- as.owin(adminboundary_bmr)

par(bg = '#E4D5C9',
    mar = c(0,0,1,0))

plot.owin(adminboundary_bmr_owin)

summary(adminboundary_bmr_owin)
Window: polygonal boundary
single connected closed polygon with 13779 vertices
enclosing rectangle: [587893.5, 712440.5] x [1484413.7, 1579076.3] units
                     (124500 x 94660 units)
Window area = 7668990000 square units
Fraction of frame area: 0.65

3.6 Combing ppp and owin objects

In section 3.4, we have created our ppp object (accidents_ppp_jit) which represents the spatial points of accident locations. In section 3.5, we have created a owin object called (adminboundary_bmr_owin), which represent the observation window of our analysis.

The observation window adminboundary_bmr_owin and the point pattern accidents_ppp_jit can be combined, so that the custom window replaces the default rectangular extent (as seen in section 3.2).

acc_bmr_ppp = accidents_ppp_jit[adminboundary_bmr_owin]

par(bg = '#E4D5C9',
    mar = c(0,0,1,0))
plot(acc_bmr_ppp)

4 Exploratory Spatial Data Analysis

Descriptive statistics are used in point pattern analysis to summarise a point pattern’s basic properties, such as its central tendency and dispersion. The mean centre and the median centre are two often employed metrics for central tendency.

4.1 Measuring Central Tendency

4.1.1 Mean Center

Mean center is the arithmetic average of the (x, y) coordinates of all point in the study area. Similar to mean in statistical analysis, mean center is influenced to a greater degree by the outliers.

accidents_xy <- st_coordinates(accidents_bmr_extra)
accidents_mc <- apply(accidents_xy, 2, mean)

accidents_mc
        X         Y 
 668399.5 1523495.8 

The results show that the mean centre is at (668399.5, 1523495.9).

4.1.2 Median Center

Median center is the location that minimises the sum of distances required to travel to all points within an observation window. The procedure begins at a predetermined point, such as the median center, as the initial point. Then, the algorithm updates the median center’s new coordinates (x’, y’) continually until the optimal value is reached. The median center, as opposed to the mean center, offers a more reliable indicator of central tendency as it is unaffected by outliers.

accidents_medc <- apply(accidents_xy, 2, median)
accidents_medc
        X         Y 
 673446.1 1520755.0 

Based on the results, the median centre of accidents is (673446.1, 1520755.0).

The mean centers and median centers are similar. This may imply that the distribution of the data is relatively balanced and there is not a significant difference in the spatial patterns between the accident points. Additionally, this indicates that both the mean center and median center are effective measures for analyzing the central tendency of the data in this context.

par(bg = '#E4D5C9', mar = c(0,0,1,0))

plot(st_geometry(adminboundary_bmr), 
     col='#eeeae2')

plot(accidents_xy, 
     add = T, cex=0.7, pch = 21,
     main="Mean and Median Centers of Accidents in BMR")
points(cbind(accidents_mc[1], accidents_mc[2]), pch='*', col='#f5347f', cex=3)
points(cbind(accidents_medc[1], accidents_medc[2]), pch='*', col='#bb8bdc', cex=3)

4.2 Measuring Dispersion

4.2.1 Standard Distance

Standard distances are defined similarly to standard deviations. This indicator measures how dispersed a group of points is around its mean center.

accidents_sd <- sqrt(sum((accidents_xy[,1] - accidents_mc[1])^2 +
                           (accidents_xy[,2] - accidents_mc[2])^2) 
                     / nrow(accidents_xy))

accidents_sd
[1] 27235.14

4.2.2 Plotting Standard Distance

In this section, we will create bearing circle of accident points using the standard distance value we have calculated earlier. This can provide visual representation of the dispersion.

par(bg = '#E4D5C9',
    mar = c(0,0,1,0))

plot(st_geometry(adminboundary_bmr), col='#eeeae2', main="Standard Distance of Accidents in BMR")

plot(accidents_xy, cex=.5)
points(cbind(accidents_mc[1], accidents_mc[2]), pch='*', col='#f5347f', cex=3)

bearing <- 1:360 * pi/180
cx <- accidents_mc[1] + accidents_sd * cos(bearing)
cy <- accidents_mc[2] + accidents_sd * sin(bearing)
circle <- cbind(cx, cy)
lines(circle, col='#f5347f', lwd=2)

4.3 Spatial Randomness Test

Clark and Evans (1954) give a very simple test of spatial randomness called Clark and Evans aggregation index (R). It is the ratio of the observed mean nearest neighbour distance in the pattern to that expected for a Poisson point process of the same intensity. R-value >1 suggests ordering, while R-value <1 suggests clustering.

We will perform the Clark-Evans test of aggregation for a spatial point pattern by using clarkevans.test() of statspat.

The test hypotheses are:

\(H_0\) = The distribution of accident points are randomly distributed.

\(H_1\) = The distribution of accidents points are not randomly distributed.

The 95% confidence interval will be used.

set.seed(1234)
clarkevans.test(acc_bmr_ppp,
                correction="none",
                clipregion="adminboundary_bmr_owin",
                alternative=c("clustered"),
                nsim=99)

    Clark-Evans test
    No edge correction
    Z-test

data:  acc_bmr_ppp
R = 0.22579, p-value < 2.2e-16
alternative hypothesis: clustered (R < 1)

The Clark-Evans test for the accident points shows an R-value of 0.22579, which is less than 1. This indicates a clustered distribution. The p-value is less than 2.2e-16, which is extremely small and less than the significance level of 0.05. This means that we will reject the null hypothesis (\(H_0\)) and accept the alternative hypothesis (\(H_1\)).

Therefore, the statistical inference from this test is that the accident points are not randomly distributed but are clustered. This suggests that there may be underlying factors influencing the spatial distribution of these points.

5 First-Order Spatial Point Pattern Analysis

After data wrangling is complete, we will start to perform first-order spatial point pattern analysis using functions from spatstat package. First-order properties concern the characteristics of individual point locations and their variations of their density across space and are mostly addressed by density-based techniques, such as quadrant analysis and kernel density estimation.

Investigation of the intensity of a point pattern is one of the first and most important steps in point pattern analysis. If the point process has an intensity function λ(u), this function can be estimated non-parametrically by kernel estimation. Kernel estimation allows for smoothing of the probability density estimation of a random variable (in this analysis a point event) based on kernels as weights.

5.1 Rescaling acc_bmr_ppp

The EPSG: 32647 Coordinate References System uses meters as the standard unit. Thus, acc_bmr_ppp prepared earlier is also in metres. However, we will need to convert the measuring unit from metre to kilometeres when calculating the kernel density estimators for entirety of BMR because kilometers provide a more appropriate scale for analysing large areas.

acc_bmr_ppp.km <- rescale(acc_bmr_ppp, 1000, "km")

5.2 Computing Default Kernel Density Estimation

Kernel Destiny Estimation (KDE) generates a surface (raster) representing the estimated distribution of point events over the observation window. Each cell in the KDE layer carries a value representing the estimated density of that location.

KDE allows us to identify traffic accident hot spots, which is an essential step for the appropriate allocation of resources for safety improvements. To do that, we use density.ppp() from the spatstat package.

Show the code
par(bg = '#E4D5C9',
    mar = c(0,0,1,0))

kde_default <- density(acc_bmr_ppp.km)
plot(kde_default, main = "Default Density KDE")
contour(kde_default, add=TRUE)

The key argument to pass to the density() function for point pattern objects is sigma, which determines the smoothing bandwidth of the kernel. A smaller bandwidth reveals more details, creating peaks and valleys, while a larger bandwidth smooths the distribution but with less precision. If the bandwidth is too small, the result can look overly noisy, and if it’s too large, important details may be lost due to over-smoothing.

By default, when the sigma value isn’t provided, a bandwidth is determined by a simple rule of thumb that depends only on the size of the window. This default setting might not always give the desired result. In the KDE plot we generated, there’s evidence of over-smoothing, where only one large spatial cluster is visible, potentially hiding smaller clusters or important details.

To address this, we can manually set the bandwidth using the sigma argument or choose a different kernel function through the kernel argument. This will help create more intuitive and detailed KDE maps that better capture the structure of the data.

5.3 KDE Layers with Fixed Bandwidth

5.3.1 Computing Fixed Bandwidths Using Different Bandwidth Selection Methods

4 automatic bandwidth calculation methods are available:

  • bw.diggle(): In the Cross Validated Bandwidth Selection, the bandwidth is chosen to minimise the mean-square error criterion. The mean-square error is a measure of the average of the squares of the errors - that is, the average squared difference between the estimated values and the actual value.

  • bw.CvL(): In the Cronie and van Lieshout’s Criterion for Bandwidth Selection, the bandwidth is chosen to minimise the discrepancy between the area of the observation window and the sum of reciprocal estimated intensity values at the points of the point process. This method aims to choose a bandwidth that best represents the underlying point process, taking into account both the observed points and the area they occupy.

  • bw.scott(): In the Scott’s Rule for Bandwidth Selection, the bandwidth is computed by the rule of thumb where the bandwidth is proportional to \(n^{-1/(d+4)}\), where n is the number of points and d is the number of spatial dimensions. This method is useful for estimating gradual trend.

  • bw.ppl(): In the Likelihood Cross Validation Bandwidth Selection, the bandwidth is chosen to maximise the point process likelihood.

bw_diggle <- bw.diggle(acc_bmr_ppp.km)
bw_diggle
     sigma 
0.04025879 
bw_CvL <- bw.CvL(acc_bmr_ppp.km)
bw_CvL
   sigma 
11.55349 
bw_scott <- bw.scott(acc_bmr_ppp.km)
bw_scott
 sigma.x  sigma.y 
4.527996 3.318169 
bw_ppl <- bw.ppl(acc_bmr_ppp.km)
bw_ppl
    sigma 
0.3493275 
bw_scott_iso <- bw.scott.iso(acc_bmr_ppp.km)
bw_scott_iso 
   sigma 
3.876165 
par(bg = '#E4D5C9',
    #mar = c(0,0,1,0),
    mfrow = c(1,2))

plot(bw_diggle, xlim=c(0.0,0.06), ylim=c(-160,100))
plot(bw_CvL)

par(bg = '#E4D5C9',
    #mar = c(0,0,1,0),
    mfrow = c(1,2))

plot(bw_scott, main="bw_scott")

plot(bw_ppl,
     xlim=c(-1,5), 
     ylim=c(00,30000))

5.3.2 Choosing Fixed-Bandwidth KDE

kde_diggle <- density(acc_bmr_ppp.km, bw_diggle)
kde_CvL <- density(acc_bmr_ppp.km, bw_CvL)
kde_scott <- density(acc_bmr_ppp.km, bw_scott)
kde_ppl <- density(acc_bmr_ppp.km, bw_ppl)

par(bg = '#E4D5C9',
    mar = c(1,1,1,1.5),
    mfrow = c(2,2))

plot(kde_diggle,main = "kde_diggle")
plot(kde_CvL,main = "kde_CvL")
plot(kde_scott,main = "kde_scott")
plot(kde_ppl,main = "kde_ppl")

Next, we will try to plot histograms to compare the distribution of KDE values obtained from density() function using different bandwidth selection methods.

par(bg = '#E4D5C9',
    mar = c(2,2,2,2),
    mfrow = c(2,2))

hist(kde_diggle,main = "kde_diggle")
hist(kde_CvL,main = "kde_CvL")
hist(kde_scott,main = "kde_scott")
hist(kde_ppl,main = "kde_ppl")

par(bg = '#E4D5C9',
    mar = c(0,0,1,0))

kde_fixed_scott <- density(acc_bmr_ppp.km, bw_scott)
plot(kde_fixed_scott,
     main = "Fixed-Bandwidth KDE for Accident Points (Using bw_scott)")
contour(kde_fixed_scott, add=TRUE)

Visual inspection reveals that the bandwidth suggested by the bw_scott method causes noticeable over-smoothing. Although automatic bandwidth selection offers a solid initial estimate, fine-tuning is often required to achieve more accurate results.

To counteract the over-smoothing, we will apply a simple adjustment by reducing the bandwidth by half. This reduction should help capture more detailed patterns in the data and avoid the loss of important spatial information.

par(bg = '#E4D5C9',
    mar = c(0,0,1,0))

kde_fixed_scott <- density(acc_bmr_ppp.km, bw_scott/2)
plot(kde_fixed_scott,
     main = "Fixed-Bandwidth KDE for Accident Points (Using bw_scott)")
contour(kde_fixed_scott, add=TRUE)

From the plot above, it seems that reducing the bandwidth (which shrinks the point cluster buffers) has lessened the over-smoothing effect while still clearly illuminating the accident hotspot areas.

5.3.3 Kernel Methods

There are 4 types of kernels in density.ppp(), namely: Gaussian, Epanechnikov, Quartic and Dics. A Gaussian kernel is set automatically.

The code chunk below will be used to compute three more kernel density estimations by using these three kernel function.

kde_fixed_scott.gaussian <- density(acc_bmr_ppp.km, 
                          sigma=bw_scott, 
                          edge=TRUE, 
                          kernel="gaussian")


kde_fixed_scott.epanechnikov <- density(acc_bmr_ppp.km, 
                          sigma=bw_scott, 
                          edge=TRUE, 
                          kernel="epanechnikov")
   
kde_fixed_scott.quartic <- density(acc_bmr_ppp.km, 
                          sigma=bw_scott, 
                          edge=TRUE, 
                          kernel="quartic")
       
   
kde_fixed_scott.disc <- density(acc_bmr_ppp.km, 
                          sigma=bw_scott, 
                          edge=TRUE, 
                          kernel="disc")
         
par(bg = '#E4D5C9',
    mar = c(0,0,1,0),
    mfrow = c(2,2))

plot(kde_fixed_scott.gaussian, main="Gaussian")
plot(kde_fixed_scott.epanechnikov, main="Epanechnikov")
plot(kde_fixed_scott.quartic, main="Quartic")
plot(kde_fixed_scott.disc, main="Disc")

5.4 KDE Layers with Spatially Adaptive Bandwidth

The bandwidth of a kernel estimator can be either fixed across the entire mapping area or adaptive to suit in local situations. The fixed bandwidth method explored in our earlier analysis is said to be sensitive to highly skewed distribution of spatial point patterns over geographical units for example urban versus rural. One way to overcome this problem is by using adaptive bandwidth instead.

In the section below, we compare the fixed with adaptive bandwidth-based KDE, and how they were able to detect accident hot spots.

adaptive.density() of Spatstat offers 3 estimation methods:

  1. method = “voronoi”: which estimates the intensity using the Voronoi-Dirichlet tessellation
  2. method = “kernel”: which estimates the intensity using a variable-bandwidth kernel estimator
  3. `method = “nearest”: which computes an estimate of the intensity function of a point pattern dataset using the distance from each spatial location to the kth nearest points
kde_adaptive_vd <- adaptive.density(acc_bmr_ppp.km, 
                                    method = "voronoi")
Show the code
par(bg = '#E4D5C9',
    mar = c(0,0,1,0))

plot(kde_adaptive_vd,
     main = "Voronoi-Dirichlet Adaptive Density Estimate")

kde_adaptive_kernel <- adaptive.density(acc_bmr_ppp.km, 
                                        method = "kernel")
Show the code
par(bg = '#E4D5C9',
    mar = c(0,0,1,0))

plot(kde_adaptive_kernel,
     main = "Adaptive Kernel Density Estimate")

kde_adaptive_knn <- adaptive.density(acc_bmr_ppp.km,
                                     method = "nearest",
                                     k = 100)
Show the code
par(bg = '#E4D5C9',
    mar = c(0,0,1,0))

plot(kde_adaptive_knn,
     main = "Nearest-Neighbour Adaptive Density Estimate")

5.4.1 Choosing Adaptive KDE Method

Just as we did with the fixed bandwidth, we can create histograms to compare the distribution of KDE values obtained from the density() function using different adaptive bandwidth selection methods.

par(bg = '#E4D5C9', 
    mar = c(2,2,2,2),
    mfrow = c(2,2))

hist(kde_adaptive_vd, main = "Voronoi-Dirichlet Adaptive")
hist(kde_adaptive_kernel, main = "Adaptive Kernel")
hist(kde_adaptive_knn, main = "Nearest-Neighbour Adaptive")

5.5 Plotting Interactive KDE Maps

raster_kde_fixed_scott <- raster(kde_fixed_scott)
raster_kde_adaptive_nn <- raster(kde_adaptive_knn)
raster_kde_adaptive_kernel <- raster(kde_adaptive_kernel)

projection(raster_kde_fixed_scott) <- CRS("+init=EPSG:32647 +units=km")
projection(raster_kde_adaptive_nn) <- CRS("+init=EPSG:32647 +units=km")
projection(raster_kde_adaptive_kernel) <- CRS("+init=EPSG:32647 +units=km")
Show the code
tmap_mode('view')
kde_fixed_scott <- tm_basemap(server = "OpenStreetMap") +
  tm_shape(raster_kde_fixed_scott) +
  tm_raster("layer",
            n = 10,
            title = "KDE_Fixed_scott",
            alpha = 0.6,
            palette = c("#f9edd1","#feb7c9","#f775a9", "#bb8bdc","#021c9e")) +
  tm_shape(adminboundary_bmr)+
  tm_polygons(alpha=0.1,id="ADM1_EN")+
  tmap_options(check.and.fix = TRUE)

kde_adaptive_nn <- tm_basemap(server = "OpenStreetMap") +
  tm_shape(raster_kde_adaptive_nn) +
  tm_raster("layer",
            n = 7,
            title = "KDE_Adaptive_nn",
            style = "pretty",
            alpha = 0.6,
            palette = c("#f9edd1","#feb7c9","#f775a9", "#bb8bdc","#021c9e")) +
  tm_shape(adminboundary_bmr)+
  tm_polygons(alpha=0.1,id="ADM1_EN")+
  tmap_options(check.and.fix = TRUE)

kde_adaptive_kernel <- tm_basemap(server = "OpenStreetMap") +
  tm_shape(raster_kde_adaptive_kernel) +
  tm_raster("layer",
            n = 7,
            title = "KDE_Adaptive_Kernel",
            style = "pretty",
            alpha = 0.6,
            palette = c("#f9edd1","#feb7c9","#f775a9", "#bb8bdc","#021c9e")) +
  tm_shape(adminboundary_bmr)+
  tm_polygons(alpha=0.1,id="ADM1_EN")+
  tmap_options(check.and.fix = TRUE)

tmap_arrange(kde_fixed_scott, 
             kde_adaptive_nn, 
             kde_adaptive_kernel,
             ncol=1,
             nrow=3,
             sync = TRUE)
Show the code
tmap_mode('plot')

5.6 Province-Level KDE

For a more detailed look, we will create province-area level KDE maps for 2 provinces identified. In order to create such maps, we will carry out additional data wrangling as required.

Firstly, we will filter out different planning areas as separate sf objects from adminboundary_bmr.

bkk = adminboundary_bmr %>% filter(ADM1_EN == "Bangkok")
spk = adminboundary_bmr %>% filter(ADM1_EN == "Samut Prakan")

par(bg = '#E4D5C9',
    mar = c(1,1,1,0),
    mfrow=c(1,2))
plot(st_geometry(bkk), main = "Bangkok")
plot(st_geometry(spk), main = "Samut Prakan")

Next, we will create owin objects to represent the observation windows for respective planning area. Once owin objects are created, we will also filter accident locations in each observation window from the original acc_bmr_ppp ppp object.

bkk_owin = as.owin(bkk)
spk_owin = as.owin(spk)

acc_bkk_ppp = acc_bmr_ppp[bkk_owin]
acc_spk_ppp = acc_bmr_ppp[spk_owin]

Now that we have prepared both owin and ppp objects for each planning area, we are ready to plot KDE maps. Similar to what we have done in previous section, we will try both fixed-bandwidth and adaptive bandwidth KDE maps.

5.6.1 Province-level Fixed Bandwidth KDE Maps

bkk_kde_scott <- density(acc_bkk_ppp, sigma=bw.scott, main="Bangkok")
spk_kde_scott <- density(acc_spk_ppp, sigma=bw.scott, main="Samut Prakan")

par(bg = '#E4D5C9', 
    mar = c(1,1,1,1.5),
    mfrow = c(1,2))

plot(bkk_kde_scott,
     main = "Fixed KDE - Bangkok")
contour(bkk_kde_scott, 
        add=TRUE)

plot(spk_kde_scott,
     main = "Fixed KDE - Samut Prakan")
contour(spk_kde_scott, 
        add=TRUE)

5.6.2 Province-level Adaptive-Bandwidth KDE Maps

bkk_kde_adaptive_kernel <- adaptive.density(acc_bkk_ppp, method = "kernel")
spk_kde_adaptive_kernel <- adaptive.density(acc_spk_ppp, method = "kernel")


par(bg = '#E4D5C9', mar = c(1,1,1,1.5),mfrow = c(1,2))

plot(bkk_kde_adaptive_kernel,
     main = "Adaptive KDE - Bangkok")
contour(bkk_kde_adaptive_kernel, 
        add=TRUE)

plot(spk_kde_adaptive_kernel,
     main = "Adaptive KDE - Samut Prakan")
contour(spk_kde_adaptive_kernel, 
        add=TRUE)

6 Network Constrained Kernel Density Estimation (NKDE)

7 Temporal Network Kernel Density Estination (TNKDE)

8 Conclusion

9 References

The analysis approach will be threefold. Firstly, it compares temperature differentials between urban and rural landscapes. Subsequently, it investigates variations in temperature between wet and dry seasons. Lastly, it delves into the comparison of temperatures across different years.

flowchart TD
    A[MSS Temperature Records] --> A1[Urban vs Rural] 
    A1 -.-> A11[Urban: Changi]
    A1 -.-> A12[Rural: Tengah]
    A --> A2[Seasons]
    A2 -.-> A21[Dry: June]
    A2 -.-> A22[Wet: December]
    A --> A3[Across Years]
    A3 -.-> A31[1983 to 2023]